3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
We consider the problem of obtaining dense 3D reconstructions of deformable objects from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we aim at recovering several plausible reconstructions compatible with the input data. We suggest that ambiguities can be modeled more effectively by parametrizing the possible body shapes and poses via a suitable 3D model, such as SMPL for humans. We propose to learn a multi-hypothesis neural network regressor using a best-of-M loss, where each of the M hypotheses is constrained to lie on a manifold of plausible human poses by means of a generative model. We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans, and in heavily occluded versions of these benchmarks.
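To make the best-of-M idea concrete, here is a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: the function name `best_of_m_loss`, the use of squared Euclidean distance, and generic D-dimensional vectors in place of SMPL parameters or mesh vertices are all illustrative. It also leaves out the generative-model constraint that keeps each hypothesis on the manifold of plausible poses. For each sample, only the hypothesis closest to the ground truth contributes to the loss.

```python
import numpy as np

def best_of_m_loss(hypotheses, target):
    """Illustrative best-of-M loss (hypothetical helper, not the authors' code).

    hypotheses: array of shape (B, M, D), M candidate predictions per sample.
    target:     array of shape (B, D), one ground-truth vector per sample.
    Returns the mean over the batch of the smallest per-sample error,
    together with the index of the winning hypothesis for each sample.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    target = np.asarray(target, dtype=float)

    # Squared Euclidean error of every hypothesis, shape (B, M).
    # Assumption: the paper's actual error measure may differ.
    errors = np.sum((hypotheses - target[:, None, :]) ** 2, axis=-1)

    # Keep only the best hypothesis for each sample.
    best_idx = np.argmin(errors, axis=1)
    best_err = errors[np.arange(errors.shape[0]), best_idx]
    return best_err.mean(), best_idx
```

Because only the winning hypothesis is penalized, the remaining hypotheses are free to cover other plausible reconstructions instead of collapsing onto a single averaged prediction.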
Review for NeurIPS paper: 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
Weaknesses: The authors acknowledge that ambiguous human pose has been considered before (Lines 34-36). They claim to be the first to look at full meshes. This is both a bit narrow and probably not true. Certainly the papers I cite below used meshes, just not learned body models like SMPL. I think these lines should be replaced by a clearer statement of the contribution.
Review for NeurIPS paper: 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data
The methodology is interesting, but some of the claims in the paper are questionable and multiple references are missing. In the final version, please include the references indicated by R1 and R2 as well as the clarifications requested during reviewing. In particular, please correctly relate the method to prior work that uses normalizing flows to build kinematic priors, published at CVPR 2020 and ECCV 2020 and currently not cited, as indicated by R2.